Action Symbols and Action Routines:

Generating Code for Assignment Statements


An Example

Start with the Grammar

Just as we did when we were including semantic action calls for constructing the symbol table, to determine where we have enough information to generate IR when parsing, we look at the grammar we based our parser on.  For example, part of this grammar might look like the following (note:  this is not mPascal!).

 <statement> ⟶ <ident> := <expression> ;
 <statement> ⟶ read ( <id_list> ) ;
 <statement> ⟶ write ( <expression_list> ) ;
     .     .     .
 <ident>     ⟶ id

Recall how we incorporate semantic information into our compiler:  every nonterminal and token potentially has a semantic record associated with it.  Of course, in our compilers this means that we must declare such records and pass them as parameters to the different routines as necessary in order to get these records filled with the proper semantic information.

Add Semantic Actions

If we look at the above grammar fragment we know that we will have a single procedure named Statement in our recursive descent parser to handle all of the rules that begin with <statement> on the left.  In the rule


  <statement> ⟶ <ident> := <expression>
                         ^
    

we can think of where we might be in parsing this rule at any time by including a marker (^) at various points in the rule.  In this instance, the marker indicates that we are at the point in parsing this rule where we have called procedure Ident and returned successfully, and that we are now about to match the := operator.  Is this a point at which we should know enough information to begin outputting IR to accomplish this assignment statement?

It takes practice to know where to put semantic action calls.  However, once you've done a few it gets easier to decide where the action calls go.  In this rule, the only semantic action should likely go at the end, just before the semicolon, because there we should have all the information we need to generate the code for an assignment statement:

<statement> ⟶ <ident> := <expression>

At this point we have called and returned from Ident, so we should now know what the identifier is that is to receive the assignment, and we have called and returned from Expression, so we should know what gets assigned to the identifier on the left hand side.  Where is this information?  Well, the semantic record for <ident> should have in it something that lets us determine the lexeme for the identifier and its attributes.  This could be a pointer to the symbol table entry for the lexeme, or it could simply be the lexeme itself (so that the lexeme could be looked up in the symbol table later).

Suppose the assignment statement being parsed is

x := a * (b + c)/10;

What should the semantic record for <expression> contain?  The expression expanded beneath <expression> is quite complicated:

a * (b + c)/10

By the time we finish the expansion of <expression> and return (in a recursive descent parser this would be akin to calling procedure expression and then getting back from that call), our compiler will have generated the code for the entire expression

a * (b + c)/10

The result will be stored in some temporary location in a register-based translation or will be left on top of the stack in a stack based implementation.  In either case, this temporary value has some type.  Information about this type needs to be in the expression semantic record for the stack based approach and, in addition to the type, information about where the temporary value is stored must be kept in the expression semantic record for the register based approach.  It is also possible that this information about the temporary value could be kept in the symbol table, and the symbol table could be extended for each temporary value generated.

Our action would be shown in the grammar as

<statement> ⟶ <ident> := <expression> #gen_assign(ident_rec, expression_rec) ;               

we use the # to show that the new information inserted into the grammar is not a non terminal or terminal, but rather an action symbol that will translate into an action call to the semantic analyzer in our parser.

The name gen_assign is supposed to convey the meaning of this particular action, namely, that it is to generate code for an assignment statement.  In order to do this, the gen_assign method will need the ident_rec and expression_rec semantic records.

Similarly, for the single rule with <ident> on the left, after having matched id, we know enough information to process the found id.  For example, the action might be to check whether the id is in the symbol table and, if so, include a pointer to it in the semantic record for ident (we will need to pass this value back up to where ident was called from, so the information about the id should be placed in ident_rec, which we will assume is a parameter passed to procedure Ident):

<ident>     ⟶ id #process_id(ident_rec)

Modify the Parser

To see how this would work in the parser, look at the following code for procedure Statement and procedure Ident in our parser:


procedure statement
begin
  case scanner.lookahead is
     when id ==>
       ident;
       match(assign_op);
       expression;
       match(semicolon);
     when read ==>
       match(read);
       match(l_paren);
       id_list;
       match(r_paren);
       match(semicolon);
    when write ==>
       match(write);
       match(l_paren);
       expression_list;
       match(r_paren);
       match(semicolon);
     when others ==>
       Syntax_Error("Expecting id, read, or write.  Found",
                     scanner.lookahead);
     end case;
   end statement;
   
   
   procedure ident
      begin ident
   case scanner.lookahead is
     when id ==>
       match(id);
     when others ==>
        Syntax_Error("Expecting id. Found ",
                      scanner.lookahead);
     end case;
 end ident;
 

To modify procedure Statement, we need to declare semantic records for nonterminal <ident> and nonterminal <expression>.  We need to pass these records to procedures Ident and Expression, respectively, where they will be filled up and returned.  We can then pass the filled records to the new action routine Gen_Assign in the semantic analyzer to generate the code.  This give us the following modifications to the two routines described above.


  procedure statement
    ident_rec      : semantic_record;
    expression_rec : semantic_record;

  begin
   case scanner.lookahead is
     when id ==>
       ident(ident_rec);
       match(assign_op);
       expression(expression_rec);
       analyzer.gen_assign(ident_rec, expression_rec);
       match(semicolon);
     when read ==>
       match(read);
       match(l_paren);
       id_list;
       match(r_paren);
       match(semicolon);
     when write ==>
       match(write);
       match(l_paren);
       expression_list;
       match(r_paren);
       match(semicolon);
     when others ==>
       Syntax_Error("Expecting id, read, or write.  Found", scanner.lookahead);
     end case;
   end statement;


  procedure ident(ident_rec: out semantic_record);
  
    begin ident
      case scanner.lookahead is
        when id ==>
          match(id, ident_rec);
        when others ==>
          Syntax_Error("Expecting id. Found ", scanner.lookahead);
     end case;
   end ident;
 

Handling the Read and Write Statements

At this point, you might be wondering why we didn't put any semantic actions or records in the Read and Write rules for <statement>.  The reason is that the reading and writing translations will most likely be done when the calls to id_list (for the read statement) and expression_list for the write statement are made.  That is, as each id in the id list is matched in a Read statement (by the call to id_list) the code to read that id will be generated right then, so that there is nothing left to do upon return from id_list. A similar argument holds for the write statement.

A problem could come up if idlist can appear in more than one place in the grammar.  That is, if an idlist could appear in a variable declaration and in a read Read statement, once idlist is called, it wouldn't know where it was called from, so it would not know whether to generate a read for each variable in the list or not.  This problem could be fixed in a number of ways.  One way is to have a different nonterminals for the different lists, such as readidlist and varidlist.  Each of those could parse an idlist the same, but would be different procedures, so thus would have different sematic actions as appropriate.

Finally, in any real programming language that is being translated into the native code of some computer, the compiler would actually need to generate operating system calls to perform the read and the write, as this task cannot be done by the user in user mode.  This is one reason why a compiler must be different for different operating systems, even on the same computer.

Write the Necessary Action Routines in the Semantic Analyzer

Once the parser has been modified, we must design and write the code to implement any new action calls we inserted into the parser.  In our example this means that we need methods called gen_assign and process_id.  Rather than try to show all of the code in this example, we will just describe what the code must do.


  gen_assign(left_op, right_op : in semantic_record);
    begin
      -- Determine the types of left_op and right_op are assignment compatible
      -- if necessary, generate code to cast the type of right_op
      --   to the type of left_op or generate an error if their types are
      --   assignment incompatible
      -- if there are no errors, generate the code necessary
      --   to assign right_op to left_op, otherwise generate
      --   an error (e.g., "operands not type compatible," along with
      --    the line and column numbers.
      
      -- output for an architecture that uses registers for arithmetic
      output(IR_File, "load r1," , right_op.operand);
      -- alternate output for stack-based arithmetic architecture
      output(IR_File, "pop ", left_op.operand
    end gen_assign
    

The result of running the semantic action gen_assign would be that the two output statements would be executed at the end of that procedure, which would print something like


load  r1, 24(D0)
store r1, 8(D0)

to the IR_File output file.  Here we are assuming that 24(D0)  is the location on the run time stack where the temporary value is stored (as determined by the information in the expression semantic record, and thus stored in left_op.operand) as the result of code produced by the Expression routine, and that (8)D0 is the location on the run time stack of the identifier on the left hand side of the assignment statement (the value in the semantic record, ident_rec). 

The alternate output for a stack based architecture would instead produce the code


pop 8(D0)
   

where it is assumed that the result of the expression will be on top of the stack and the location of the left hand side variable is at location D0 offset 8 on the run time stack.

To finish this example, suppose that we had been parsing the assignment statement


 Fred := a + b;
    

Suppose further that the symbol table at this point looks like:

Table Name Nesting Level Activation Record Size Link to Next Table
bunko 0 20 null
lexeme kind type size (bytes) offset
Fred var int 4 0
a var int 4 4
b var int 4 8
. . . . . . . . . . . . . . .
t1 temporary int 4 16

(Note:  as we have discussed, the information about temporary variables could be stored in the symbol table as shown in this symbol table, or it could actually be in the expression semantic record instead.)

Then the call to Expression (and the routines it further calls) would have generated lines of code similar to the following for a register-based arithmetic archtecture:


load r1, 4(D0)   -- load register 1 with a's value
load r2, 8(D0)   -- load register 2 with b'x value 
add r1, r2, r3   -- add register 1 to register 2 and store the result in register 3
store r3, 16(D0) -- store the value in regiser 3 into a temporary location, also on
                 -- the run time stack (the symbol table must be expanded for 
                 -- temporaries)
  

After returning from the call to Expression, the routine gen_assign is called which then adds the two lines of code in blue to the IR file, giving, finally,


load r1, 4(D0)   -- load register 1 with a's value
load r2, 8(D0)   -- load register 2 with b'x value
add r1, r2, r3   -- add register 1 to register 2 and store the result in register 3
store r3, 16(D0) -- store the value in regiser 3 into a temporary location, also on
                 -- the run time stack
load  r1, 16(D0) -- load register 1 with the temporary value stored at 16(D0)
store r1, 0(D0)  -- store register 1 into Fred

For a stack-based arithmetic architecture there are no temporary variables, because intermediate values of an arithmetic computation remain on the stack.  Thus, all we need to do is have the expression semantic record describe the necessary attributes (e.g. type) of the intermediate value on top of the stack at that point. The symbol table in this case might look like

Table Name Nesting Level Activation Record Size Link to Next Table
bunko 0 12 null
lexeme kind type size (bytes) offset
Fred var int 4 0
a var int 4 4
b var int 4 8

The generated code produced by calls to the semantic analyzer would look something like the following.


    push 4(D0)  -- push a's value onto the stack
    push 8(D0)  -- push b's value onto the stack
    adds        -- add the top two elements of the stack
    pop  0(D0)  -- pop the result into Fred
    

Type Casting

It is important to remember that in a real compiler, the semantic analyzer would need to check the semantic records of any two operands to see if they are type compatible with the operator.  If they are not type compatible, but the language allows one type to be cast to another type for the operator involved, the semantic analyzer must generate code that changes one of the operands into the type of the other operand according to the rules of the language.  If no type cast is possible, the analyzer must report an error.

In your project you must also pay attention to types.  Even though microPascal only allows for integer variables in arithmetic expressions, Boolean expressions must result in a type Boolean.  You cannot assign a Boolean value to an integer variable, or perform arithmetic on Boolean values in microPascal.  This means that you must keep track of the type of intermediate expressions in the expression semantic records.